Abstract



We consider the problem of jointly optimum modulation and estimation of a real-valued
random parameter, conveyed over an additive white Gaussian noise (AWGN) channel, where
the performance metric is the large deviations behavior of the estimator, namely, the
exponential decay rate (as a function of the observation time) of the probability that the
estimation error would exceed a certain threshold. Our basic result is an exact
characterization of the fastest achievable exponential decay rate, among all possible
modulator-estimator (transmitter-receiver) pairs, where the modulator is limited only in
the signal power, but not in bandwidth. This exponential rate turns out to be given by the
reliability function of the AWGN channel. We also discuss several ways to achieve this
optimum performance, and one of them is based on quantization of the parameter, followed
by optimum channel coding and modulation, which gives rise to a separation-based
transmitter, if one views this setting from the perspective of joint source-channel coding.
This is in spite of the fact that, in general, when error exponents are considered, the
source-channel separation theorem does not hold true. We also discuss several observations,
modifications and extensions of this result in several directions, including other channels,
and the case of multidimensional parameter vectors. One of our findings concerning the
latter is that there is an abrupt threshold effect in the dimensionality of the parameter
vector: below a certain critical dimension, the probability of excess estimation error may
still decay exponentially, but beyond this value, it must converge to unity.

Index Terms: Parameter estimation, modulation, AWGN, threshold effect, large de- 
viations, reliability function, error exponents. 



1 Introduction 

The rich literature on parameter estimation includes a large variety of Bayesian and
non-Bayesian lower bounds on the mean square error (MSE) in estimating parameters from
signals corrupted by an additive white Gaussian noise (AWGN) channel, as well as other 
channels (see, e.g., the introductions of [1], [2], [20] for overviews on these bounds). Most of 
these bounds are amenable to calculation for a given form of dependence of the transmitted 
signal upon the parameter, i.e., a given modulator, and therefore they may give insights 
concerning optimum estimation for this specific modulator. They may not, however, lend 
themselves easily to the derivation of universal lower bounds, namely, lower bounds that 
depend neither on the modulator nor on the estimator, which are relevant when both 
optimum modulators and optimum estimators are sought. Two exceptions to this rule 
(although usually, not presented as such) are families of bounds that stem from generalized 
data processing theorems (DPT's) [14], [26], [28], and bounds based on hypothesis testing 
considerations [3], [27]. 

Consider, for example, a random parameter $U$, uniformly distributed across the unit
interval, which is to be conveyed across the AWGN channel with spectral density $N_0/2$,
transmission power $S$, and no bandwidth limitation. Using the classical DPT, one views
the random parameter $U$ as a "source" and the MSE of an arbitrary estimator, $E(\hat{U}-U)^2$,
as the average distortion $D$, and then derives a lower bound on $D$ from the inequality
$R(D) \le CT$, where $R(D)$ is the rate-distortion function of $U$, $T$ is the transmission time,
and $C$ is the channel capacity, which for the AWGN with unlimited bandwidth, is given by
$C = S/N_0$. Now, $R(D)$ is not known to have a closed-form expression in this case, but it
can be further lower bounded by the Shannon lower bound (see, e.g., [9, Sect. 4.6, p. 101]):

$$R(D) \ge h(U) - \frac{1}{2}\ln(2\pi e D) = -\frac{1}{2}\ln(2\pi e D), \tag{1}$$

where $h(U) = 0$ is the differential entropy of $U$. This readily leads to the universal lower
bound $E(\hat{U}-U)^2 \ge \frac{1}{2\pi e}e^{-2CT} = \frac{1}{2\pi e}e^{-2\mathcal{E}/N_0}$, where $\mathcal{E} = ST$ is the signal energy. It turns
out that this lower bound is not tight. In [26], it was shown that DPT's pertaining to
generalized information measures yield a tighter universal lower bound that decays (as
$T \to \infty$) like $e^{-CT}$. In [14], this bound was further improved, by another generalized DPT,
to behave like $e^{-2CT/3}$, and then yet further improved to $e^{-CT/2}$, using a universal lower
bound based on signal detection considerations, in the spirit of the Ziv-Zakai bound [27]
and the Chazan-Zakai-Ziv bound [3].

Concerning upper bounds, it turns out that it is possible to achieve an MSE with an
exponential decay rate of the order of $e^{-CT/3}$, which is quite close to the latter lower
bound, but there is still some gap. As is shown in [21, Chap. 8], by using frequency position
modulation (FPM) with central frequency and bandwidth that both grow like $e^{RT}$, where
$R > 0$ is a fixed design parameter, the MSE of the maximum likelihood (ML) estimator turns
out to be composed of two terms: a "small-error" term (or the "weak noise performance"
in the terminology of [21]), that behaves essentially like the Cramér-Rao bound, and which
is proportional to $e^{-2RT}$, and an anomalous error term (gross error due to the threshold
effect) of the exponential order of $e^{-E(R)T}$, where $E(R)$ is the reliability function of the
AWGN, given by

$$E(R) = \begin{cases} \frac{C}{2} - R, & 0 \le R < \frac{C}{4} \\ (\sqrt{C} - \sqrt{R})^2, & \frac{C}{4} \le R < C \\ 0, & R \ge C \end{cases} \tag{2}$$

The optimum trade-off between these two terms is achieved for $R = C/6$, where they
have the same exponential rate, $e^{-CT/3}$ (see also [13]). Similar things can be said about
pulse position modulation (PPM) with exponentially growing bandwidth [13]. Yet another
modulation scheme is based on simply quantizing the parameter $U$ into one of $M = e^{RT}/2$
evenly spaced points in its interval (which are then $2e^{-RT}$ apart) and then assigning,
to each one of these points, one out of $M$ orthogonal signals with energy $\mathcal{E}$ (see [15] for an
analogue for the Poisson channel). Here, the MSE has the same two exponential terms as
before, but now the first term, $e^{-2RT}$, is the contribution of the quantizer to the MSE and
the second term, $e^{-E(R)T}$, is the contribution of channel decoding errors.
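
As a quick numerical check of this trade-off, the following sketch evaluates the reliability function of the infinite-bandwidth AWGN channel and confirms that at $R = C/6$ the small-error exponent $2R$ and the anomalous-error exponent $E(R)$ coincide at $C/3$. The function name `reliability` and the value $C = 1$ are illustrative assumptions, not part of the original analysis.

```python
import math

def reliability(R, C):
    """E(R) of the infinite-bandwidth AWGN channel with capacity C = S/N0."""
    if R < 0:
        raise ValueError("R must be non-negative")
    if R < C / 4:
        return C / 2 - R                            # straight-line (low-rate) part
    if R < C:
        return (math.sqrt(C) - math.sqrt(R)) ** 2   # curved (high-rate) part
    return 0.0                                      # no exponential decay at or above capacity

C = 1.0                      # illustrative capacity value (assumption)
R_star = C / 6               # rate at which the two MSE terms are balanced
small_error_exponent = 2 * R_star           # exponent of the e^{-2RT} term
anomaly_exponent = reliability(R_star, C)   # exponent of the e^{-E(R)T} term
print(small_error_exponent, anomaly_exponent)  # both are approximately C/3
```

Choosing a smaller $R$ makes the small-error term dominate, and a larger $R$ makes the anomalous term dominate, which is why the balance point determines the best MSE exponent of these schemes.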

The quest for closing (or at least, further reducing) the gap between the best known
lower bound, $e^{-CT/2}$, and the upper bound, $e^{-CT/3}$, remains unsatisfied at present. This
challenge has, unfortunately, defied our best efforts thus far. We conjecture that it is the 
lower bound that is to be "blamed" for this gap, i.e., we believe that the above-mentioned 
modulation schemes are essentially optimal but there is room for further improvement of 
the lower bound that has not been exploited yet. 

In this paper, instead of focusing on the MSE as our performance metric, we adopt a large 
deviations performance metric: We seek optimum modulation and estimation schemes in 
the sense of maximizing the exponential rate of decay of the probability that the estimation 
error $|\hat{U} - U|$ would exceed a given threshold. Motivated by the above discussion, we can
afford to set this threshold to be exponentially decaying with $T$, i.e., $e^{-RT}$, where $R > 0$ is a



parameter whose value can be chosen freely in some range. More precisely, our asymptotic 
figure of merit for modulation-estimation is 



$$E^*(R) = \limsup_{T\to\infty}\left[-\frac{1}{T}\log\inf\Pr\{|\hat{U}-U| > e^{-RT}\}\right] \tag{3}$$



where the infimum is over all modulator-estimator pairs with power $S$, and where we
remind the reader that $\limsup_{T\to\infty} f(T)$, for a continuous-valued variable $T$ (as opposed
to a sequence $\{T_n\}$), is defined as $\lim_{T\to\infty}\sup_{T'\ge T} f(T')$.

Our basic result (asserted and proved in Section 2) is that the limsup in eq. (3) is
equal to the corresponding liminf (and hence can be replaced by lim) and their common
value has an exact characterization given by $E^*(R) = E(R)$, where $E(R)$ is as in (2).
All three modulation schemes mentioned above, together with ML estimation, achieve this
performance and hence are asymptotically optimum in the above sense.^1

Beyond the fact that the large deviations performance metric has already been addressed
in estimation theory (see, e.g., [10], [11, p. 4], [16], [18, p. 54], [24, eq. (32)], [27, Sect. IV]),
a little thought suggests that it is actually natural in this particular setting of wide-band
waveform communication, which exhibits threshold effects and anomalies. The reason is
that it makes a clear distinction between 'small' errors, of the order of $e^{-RT}$ ("allowed"
under this metric), and gross errors, whose probabilistic weight is $e^{-E(R)T}$ at best.^2 A
distinction in the same spirit (but not quite the same) was offered also in [21, Sect. 8.4],
where it was shown that a non-anomalous MSE of about $e^{-2CT}$ is the best that can be
achieved (and again, by the same schemes) under the constraint that the probability of
anomaly tends to zero. This has the flavor of our result for $R \to C$, but here, we expand
the spectrum of trade-offs to the entire range $0 < R < C$. For $R > C$, the error exponent
vanishes in the strong sense, i.e., not only does the probability of the undesired error event
cease to decay exponentially, it actually tends to unity. In that sense, the threshold effect
is manifested in a clear way.

Having hopefully convinced the reader that the large deviations performance criterion 
is reasonable in the waveform communication setting considered here, there is considerable 



^1 The fact that exponentially small error thresholds are exceeded with exponentially small probabilities
is rather remarkable. It is thanks to the fact that the modulator is subjected to optimization. By contrast,
for amplitude modulation (AM), where the estimation error of the ML estimator has variance $N_0/(2\mathcal{E}) = 1/(2CT)$,
we have $\Pr\{|\hat{U}-U| > e^{-RT}\} = 2Q(e^{-RT}\sqrt{2CT}) \to 1$ for every $R > 0$, and only for $R = 0$ this
probability decays exponentially.

^2 Typically, in the case of anomaly, the estimate $\hat{U}$ falls in a random point away from $U$, and so, it makes
sense to assign to all gross error events the same cost, as is done by the proposed metric.



room for the speculation that this may not be the case with the MSE criterion, despite 
its popularity. The difficulty in capturing the threshold effect and in closing gaps between 
upper and lower bounds in this setting, as discussed above, may be attributed to the fact 
that the MSE does not distinguish between the small errors and the anomalous errors, 
which are so different in nature. Comments in the same spirit are made also in [21, p. 633, 
central paragraph]. 

We discuss several observations and implications of the above described basic result
(Section 3) and several extensions (Section 4), including other channels, variable power,
and the case of a multidimensional parameter vector $U = (U_1, \ldots, U_d)$. In the vector case,
our error exponent criterion becomes

$$E^*(R_1,\ldots,R_d) = \limsup_{T\to\infty}\left[-\frac{1}{T}\log\inf\Pr\left(\bigcup_{i=1}^{d}\left\{|\hat{U}_i - U_i| > e^{-R_iT}\right\}\right)\right] \tag{4}$$

where our earlier characterization, in terms of the reliability function, extends to 



$$E^*(R_1,\ldots,R_d) = E(R_1 + R_2 + \cdots + R_d). \tag{5}$$

One of the conclusions of this result is that there is an abrupt threshold effect in the 
dimensionality of the parameter vector: below a certain critical dimension, the probability 
of excess estimation error may still decay exponentially, but beyond this value, it must 
converge to unity. We also discuss several other implications of our results. 

As a closing remark, we should point out that the criterion of excess estimation error 
probability was briefly discussed also in [27, Section IV], where a lower bound was given in 
terms of the error probability of an M-ary detection problem with optimum signaling. This 
is similar to the line of thought here; however, there are several differences: (i) We consider
a Bayesian setting where $U$ is a random variable, as opposed to the worst-case excess
error probability, $\max_u \Pr\{\text{excess estimation error} \mid u\}$. (ii) We allow an arbitrary modulator,
rather than focusing on PPM specifically. (iii) We allow an exponentially vanishing error
threshold, $e^{-RT}$ (as opposed to a fixed threshold in [27], corresponding to $R = 0$) and
explore the entire spectrum of trade-offs between $R$ and the excess estimation error exponent,
which in turn is intimately related to the reliability function, $E(R)$. (iv) As described in
the previous paragraph, we also expand the scope in several directions, like the
multidimensional case and other channels. We also provide some insights from the perspectives of the
threshold effect as well as joint source-channel coding and the separation theorem.



2 Problem Formulation and the Basic Result 

Consider the signal model 

$$y(t) = x(t,u) + n(t), \quad t \in [0,T) \tag{6}$$

where $x(t,u)$ is a waveform with power $S$, which is parametrized by $u \in \mathcal{U} \subset \mathbb{R}$, and
where $n(t)$ is AWGN with two-sided power spectral density $N_0/2$. Considering an arbitrary
representation of x{t, u) as a linear combination of orthonormal basis functions, then due to 
the power limitation, the length of the curve (locus) drawn by the vector of coefficients of this 
representation, $\{a_k(u),\ k = 1, 2, \ldots\}$, as $u$ exhausts $\mathcal{U}$, must be finite (and in fact, no larger
than $e^{CT}$ [21, Chap. 8]) in order to keep the anomalous error vanishingly small. It therefore
makes sense to assume that U is a finite interval, which without loss of essential generality, 
will be taken to be the interval [—1/2, +1/2), as any other interval can be obtained under 
re-parametrization using a simple affine transformation. 

An estimator of $u$ is any measurable mapping from $\{y(t),\ 0 \le t < T\}$ into $\mathcal{U}$. In order
to avoid limitations on the class of estimators (e.g., unbiased estimators, etc.), we adopt the
Bayesian setting, i.e., we assume that $u$ is a realization of a random variable $U$, uniformly
distributed over $[-1/2, +1/2)$. The uniform prior is assumed merely for convenience and it
expresses the fact that no value of $u$ has any preference a priori. Any other prior, which is
bounded away from zero and infinity, can be used as well.

A modulator with power $S$ is a mapping from $\mathcal{U}$ into a family of waveforms $\{x(t,\cdot),\ 0 \le t < T\}$,
whose power is exactly^3 $S$, i.e.,

$$\frac{1}{T}\int_0^T dt\cdot x^2(t,u) = S \tag{7}$$

for all $u \in \mathcal{U}$. No bandwidth limitations are imposed on the waveforms in this family.

For a given $R > 0$, we are interested in characterizing the best achievable excess
estimation error exponent

$$E^*(R) = \limsup_{T\to\infty}\left[-\frac{1}{T}\log\inf\Pr\{|\hat{U}-U| > e^{-RT}\}\right], \tag{8}$$

where the infimum is over all modulator-estimator pairs as defined above.

We first provide a lower bound on the excess estimation error probability, which leads
directly to a converse theorem concerning $E^*(R)$.



^3 In Subsection 4.4, we relax the restriction that the power would be exactly $S$ for all $u$, and we allow
instead the power $S(u)$ to vary with $u$, but we keep an average power constraint, $E\{S(U)\} \le S$.



Theorem 1 Consider the AWGN channel with noise power spectral density $N_0/2$. Let
$R > 0$ be given and let $\epsilon > 0$ be arbitrarily small. For every modulator with power $S$ and
every estimator $\hat{U}$:

$$\Pr\{|\hat{U}-U| > e^{-RT}\} \ge (1 - e^{-\epsilon T})\exp\{-T[E(R-\epsilon) + o(T)]\}, \tag{9}$$

where $E(R)$ is the reliability function of the AWGN, defined as in eq. (2), and where $o(T)$
designates a quantity that tends to zero as $T \to \infty$. Consequently,

$$E^*(R) \le E(R). \tag{10}$$

While the lower bound in Theorem 1 applies, in principle, for every $\epsilon > 0$, quite
obviously, for $T \to \infty$, the tightest lower bound is obtained as $\epsilon \to 0$, which yields an exponential
decay rate of $E(R)$.



Proof. The proof is in the spirit of the derivation of the Ziv-Zakai bound [27] and the
Chazan-Zakai-Ziv bound [3], but with $M$ hypotheses (rather than 2), where $M$ is
exponentially large. Consider a given estimator $\hat{U}$ of $U$ and a given modulator $\{x(t,\cdot),\ 0 \le t < T\}$
with power $S$. For a given $u \in [-1/2, +1/2)$ and $\Delta > 0$, let $P_e(u,\Delta)$ denote the probability
of error of the optimum (ML) detector for deciding among the $M$ equiprobable hypotheses

$$\mathcal{H}_i:\ y(t) = x(t, u + i\Delta) + n(t), \quad i = 0, 1, \ldots, M-1,$$

where it is assumed that $u$ and $\Delta$ are such that $u + i\Delta$, $i = 0, 1, \ldots, M-1$, are all in
$[-1/2, +1/2)$. First, it is argued that

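$$P_e(u,\Delta) \le \frac{1}{M}\left[\Pr\left\{\hat{U}-U > \frac{\Delta}{2}\,\Big|\,U = u\right\} + \sum_{i=1}^{M-2}\Pr\left\{|\hat{U}-U| > \frac{\Delta}{2}\,\Big|\,U = u + i\Delta\right\} + \Pr\left\{\hat{U}-U < -\frac{\Delta}{2}\,\Big|\,U = u + (M-1)\Delta\right\}\right]. \tag{11}$$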

To see why this is true, observe that the r.h.s. can be interpreted as the probability of error of
a suboptimum $M$-ary detector that is based on first estimating $U$ by $\hat{U}$ and then deciding
on the hypothesis $\mathcal{H}_i$ whose corresponding grid point $u + i\Delta$ is nearest to $\hat{U}$. Next, we
further upper bound the first and the last terms of the r.h.s. by $\Pr\{|\hat{U}-U| > \Delta/2 \mid U = u\}$
and $\Pr\{|\hat{U}-U| > \Delta/2 \mid U = u + (M-1)\Delta\}$, respectively, which yields

$$P_e(u,\Delta) \le \frac{1}{M}\sum_{i=0}^{M-1}\Pr\left\{|\hat{U}-U| > \frac{\Delta}{2}\,\Big|\,U = u + i\Delta\right\}. \tag{12}$$
Integrating both sides over $u$, we get














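$$\begin{aligned}
\int_{-1/2}^{+1/2-(M-1)\Delta} du\cdot P_e(u,\Delta)
&\le \int_{-1/2}^{+1/2-(M-1)\Delta} du\cdot\frac{1}{M}\sum_{i=0}^{M-1}\Pr\left\{|\hat{U}-U| > \frac{\Delta}{2}\,\Big|\,U = u + i\Delta\right\} \\
&= \frac{1}{M}\sum_{i=0}^{M-1}\int_{-1/2}^{+1/2-(M-1)\Delta} du\cdot\Pr\left\{|\hat{U}-U| > \frac{\Delta}{2}\,\Big|\,U = u + i\Delta\right\} \\
&= \frac{1}{M}\sum_{i=0}^{M-1}\int_{-1/2+i\Delta}^{+1/2-(M-1)\Delta+i\Delta} du\cdot\Pr\left\{|\hat{U}-U| > \frac{\Delta}{2}\,\Big|\,U = u\right\} \\
&\le \frac{1}{M}\sum_{i=0}^{M-1}\int_{-1/2}^{+1/2} du\cdot\Pr\left\{|\hat{U}-U| > \frac{\Delta}{2}\,\Big|\,U = u\right\} \\
&= \int_{-1/2}^{+1/2} du\cdot\Pr\left\{|\hat{U}-U| > \frac{\Delta}{2}\,\Big|\,U = u\right\} \\
&= \Pr\left\{|\hat{U}-U| > \frac{\Delta}{2}\right\}.
\end{aligned} \tag{13}$$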

Now, let $\Delta = 2e^{-RT}$ and $M = e^{(R-\epsilon)T}/2 + 1$. Then, it is well known (see, e.g., [19, p. 168,
eq. (3.6.26) and Section 3.8], [7, p. 383, eqs. (8.2.49), (8.2.50)], [21, pp. 345, eq. (5.106c)])
that

$$P_e(u,\Delta) \ge e^{-T[E(R-\epsilon)+o(T)]}, \tag{14}$$

which, when substituted into the left-most side of (13), readily gives 

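$$\begin{aligned}
\Pr\{|\hat{U}-U| > e^{-RT}\} &\ge \int_{-1/2}^{+1/2-(M-1)\Delta} du\cdot e^{-T[E(R-\epsilon)+o(T)]} \\
&= [1 - (M-1)\Delta]\,e^{-T[E(R-\epsilon)+o(T)]} \\
&= (1 - e^{-\epsilon T})\,e^{-T[E(R-\epsilon)+o(T)]},
\end{aligned} \tag{15}$$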

completing the proof of Theorem 1. □ 
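
The last step of the proof rests on the arithmetic identity $(M-1)\Delta = e^{-\epsilon T}$ for the chosen $\Delta$ and $M$. The following minimal sketch checks it numerically; the helper name `theorem1_prefactor` and the sample values of $R$, $\epsilon$ and $T$ are illustrative assumptions, and $M$ is treated as a real number rather than rounded to an integer.

```python
import math

def theorem1_prefactor(R, eps, T):
    """Return 1 - (M - 1) * Delta for Delta = 2e^{-RT} and M = e^{(R - eps)T}/2 + 1."""
    delta = 2.0 * math.exp(-R * T)             # spacing between neighbouring hypotheses
    M = math.exp((R - eps) * T) / 2.0 + 1.0    # number of hypotheses (real-valued in this sketch)
    return 1.0 - (M - 1.0) * delta             # length of the integration range over u

for T in (1.0, 10.0, 100.0):
    # The two printed values should agree: the prefactor equals 1 - e^{-eps*T}.
    print(T, theorem1_prefactor(R=0.3, eps=0.05, T=T), 1.0 - math.exp(-0.05 * T))
```

The prefactor tends to one as $T$ grows, so it does not affect the exponential rate of the lower bound.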

Our next theorem, Theorem 2, provides a compatible achievability result.

Theorem 2 Consider the AWGN channel with noise power spectral density $N_0/2$ and let
$R > 0$ be given. Then there exists a modulator with power $S$ and an estimator $\hat{U}$ for which

$$\Pr\{|\hat{U}-U| > e^{-RT}\} \le e^{-E(R)T}. \tag{16}$$

Consequently, the limsup in eq. (3) is equal to the liminf (i.e., the limit exists) and

$$E^*(R) = E(R). \tag{17}$$



Proof. We first describe the modulator and estimator. Assume, without essential loss
of generality, that $e^{RT}/2$ is an integer (otherwise, alter the value of $R$ slightly to make it
such). The modulator first quantizes the parameter $u$ to the nearest point in the grid
$\{-1/2 + e^{-RT},\ -1/2 + 3e^{-RT},\ -1/2 + 5e^{-RT},\ \ldots,\ 1/2 - e^{-RT}\}$. This grid, which consists of
$M = e^{RT}/2$ points, is mapped into a set of $M$ orthogonal signals, each with power $S$. Let
$i(u)$ denote the index of the grid point nearest to $u$ and let $x_i(t)$ be the signal corresponding
to the $i$-th grid point, $i = 1, 2, \ldots, M$. Then the modulator is defined by

$$x(t,u) = x_{i(u)}(t). \tag{18}$$

Let $\hat{i}$ denote the output of the ML decoder for the signal set $\{x_i(t)\}_{i=1}^{M}$, namely,

$$\hat{i} = \arg\max_{1\le i\le M}\int_0^T x_i(t)y(t)\,dt. \tag{19}$$

Then, the estimator $\hat{u}$ is defined as the corresponding grid point, i.e.,

$$\hat{u} = -\frac{1}{2} + (2\hat{i}-1)e^{-RT}. \tag{20}$$

Clearly, for this particular modulator-estimator pair, the event $\{|\hat{U}-U| > e^{-RT}\}$ implies
$\hat{i} \ne i(U)$, namely, an error in decoding the index $i$ of the transmitted signal $x_i(t)$. The
probability of excess estimation error is therefore upper bounded by the probability of error
for $M = e^{RT}/2$ orthogonal signals, each with energy $\mathcal{E} = ST$, which is well known (see, e.g.,
[19, p. 67, eq. (2.5.16)] or [7, p. 381, eqs. (8.2.43), (8.2.44)], [21, pp. 344-345, eqs. (5.104)-
(5.106b)]) to be upper bounded in turn by $e^{-E[R-(\ln 2)/T]T} \le e^{-E(R)T}$. This completes the
proof of Theorem 2. □
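
The scheme in this proof can be sketched in a few lines of simulation code. The sketch below is only an illustration under stated assumptions: the $M$ orthogonal signals are represented by their correlator outputs in an orthonormal basis (one coordinate per signal), $M = 8$ stands in for $e^{RT}/2$, and the function names, energy and noise level are hypothetical choices. It checks the key fact used in the proof, namely that an excess error $|\hat{u} - u| > e^{-RT} = 1/(2M)$ can occur only when the index is decoded incorrectly.

```python
import math
import random

def quantize_index(u, M):
    """Index in 1..M of the grid point nearest to u, for u in [-1/2, 1/2)."""
    return min(int((u + 0.5) * M) + 1, M)

def grid_point(i, M):
    """i-th grid point, -1/2 + (2i - 1) e^{-RT}, written with e^{-RT} = 1/(2M)."""
    return -0.5 + (2 * i - 1) / (2.0 * M)

def modulate_and_estimate(u, M, energy, n0, rng):
    """Send u with one of M orthogonal signals; return (estimate, decoded_correctly)."""
    i = quantize_index(u, M)
    sigma = math.sqrt(n0 / 2.0)               # noise std per orthonormal coordinate
    # Correlator outputs: noise everywhere, plus sqrt(energy) on the transmitted coordinate.
    y = [rng.gauss(0.0, sigma) for _ in range(M)]
    y[i - 1] += math.sqrt(energy)
    i_hat = max(range(1, M + 1), key=lambda k: y[k - 1])  # ML decoding: largest correlation
    return grid_point(i_hat, M), i_hat == i

rng = random.Random(0)
M = 8                                          # stands in for e^{RT}/2 (small, for illustration)
trials = [rng.random() - 0.5 for _ in range(1000)]
results = [(u, *modulate_and_estimate(u, M, energy=9.0, n0=1.0, rng=rng)) for u in trials]
excess = sum(abs(u_hat - u) > 1.0 / (2 * M) + 1e-12 for u, u_hat, _ in results)
errors = sum(not ok for _, _, ok in results)
print(excess, errors)  # excess-error events can occur only when decoding fails
```

The count of excess-error events never exceeds the count of decoding errors, which is the inequality that lets the channel-coding bound carry over to the estimation problem.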

The Case $R = 0$

Theorems 1 and 2 refer to the case $R > 0$. The case $R = 0$ should be treated with
caution as there is an inherent discontinuity of the operational reliability function at $R = 0$.
As is well known, the operational reliability function for the infinite-bandwidth AWGN
channel, which is defined as the asymptotic error exponent of the optimum rate-$R$ code
for this channel, agrees with $E(R)$, given in (2), only for $R > 0$. Concerning the point
$R = 0$, there is a difference between the strong sense of this assignment, where the number
of codewords $M$ is fixed (independent of $T$), and the weak sense, where $M$ grows (but at
a subexponential rate). This is because for fixed $M$, the error exponent of the best signal
set (the simplex signal set) is determined by the minimum distance, which depends on $M$
according to $d_{\min}^2 = 2M\mathcal{E}/(M-1)$, where again, $\mathcal{E} = ST$ is the energy of all $M$ signals.
The error probability of the optimum code then decays according to $\exp[-\frac{CT}{2}\cdot\frac{M}{M-1}]$, which
agrees with $E(0) = C/2$ only when $M$ grows without bound.
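
The dependence of the minimum distance on $M$ can be verified directly. The sketch below builds a simplex signal set in the usual way (orthogonal unit vectors with their centroid removed, rescaled to the prescribed energy) and compares the resulting squared distance with $2M\mathcal{E}/(M-1)$; the function names and the numerical values are illustrative assumptions. The pairwise exponent is computed as $d_{\min}^2/(4N_0)$, using $\mathcal{E}/N_0 = CT$.

```python
import math

def simplex_signals(M, energy):
    """M simplex signals of the given energy: orthogonal unit vectors minus their centroid, rescaled."""
    scale = math.sqrt(energy / (1.0 - 1.0 / M))   # each centered vector has squared norm 1 - 1/M
    return [[scale * ((1.0 if k == i else 0.0) - 1.0 / M) for k in range(M)]
            for i in range(M)]

def squared_distance(a, b):
    """Squared Euclidean distance between two signal vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def pairwise_exponent(M, CT):
    """d_min^2 / (4 N0) for M simplex signals with E/N0 = CT, i.e. (CT/2) * M / (M - 1)."""
    return 0.5 * CT * M / (M - 1)

energy = 3.0   # illustrative value of the signal energy (assumption)
for M in (2, 4, 16, 256):
    s = simplex_signals(M, energy)
    # Measured squared distance, the formula 2ME/(M-1), and the exponent per unit CT.
    print(M, squared_distance(s[0], s[1]), 2 * M * energy / (M - 1), pairwise_exponent(M, CT=1.0))
```

For $M = 2$ the set reduces to antipodal signals and the exponent equals $CT$; as $M$ grows it decreases toward $CT/2$, in line with $E(0) = C/2$.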

Correspondingly, there is a parallel difference between the case where the error threshold,
$\Delta/2$ (in the proof of Theorem 1), is fixed, as opposed to the weaker sense where $\Delta$ is allowed
to vanish as $T$ grows, but at a subexponential rate. Theorems 1 and 2 hold for $R > 0$,
and the limit $R \to 0$ corresponds to the weaker meaning. What can be said about the
stronger meaning? Repeating the proof of Theorem 1, but with a zero-rate lower bound on
$P_e(u,\Delta)$ [19, p. 174, eqs. (3.7.2)-(3.7.5)], [21, pp. 345, eq. (5.106c)], we have (by choosing
$M = \lfloor 1/\Delta\rfloor$)



Pr{|^-^|>t}>^(l + A-ALl/AJ).g(^A._L^j. (21) 

In the limit of $T \to \infty$, this lower bound is of the exponential order of

r CT [l/AJ 
'''"'i--^Il7Al^2 

As an upper bound we have, by a compatible upper bound on the probability of error (see 

proof of Theorem 2), the following: 



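$$\Pr\left\{|\hat{U}-U| > \frac{\Delta}{2}\right\} \le (\lceil 1/\Delta\rceil - 1)\cdot Q\left(\sqrt{CT\cdot\frac{\lceil 1/\Delta\rceil}{\lceil 1/\Delta\rceil - 1}}\right), \tag{22}$$

which simply follows from the union bound on the probability of error in the detection of
one out of $M = \lceil 1/\Delta\rceil$ simplex signals with energy $\mathcal{E} = ST$. Here, the exponential behavior
is according to

$$\exp\left\{-\frac{CT}{2}\cdot\frac{\lceil 1/\Delta\rceil}{\lceil 1/\Delta\rceil - 1}\right\}.$$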

While there is a gap in the error exponents for every finite $\Delta$, this gap vanishes as $\Delta \to 0$,
thus the best achievable asymptotic value of

$$\lim_{\Delta\to 0}\lim_{T\to\infty}\left[-\frac{\ln\Pr\{|\hat{U}-U| > \Delta/2\}}{T}\right]$$

is still $E(0) = C/2$.

In this context of large deviations for fixed A, it is appropriate to mention also the 
relation with the MSE criterion: The two criteria are easily related via the identity 

$$E(\hat{U}-U)^2 = 2\int_0^{1} d\Delta\cdot\Delta\cdot\Pr\{|\hat{U}-U| > \Delta\}, \tag{23}$$

and so, the MSE can be lower bounded via any lower bound on $\Pr\{|\hat{U}-U| > \Delta\}$ for all
$\Delta$ in the appropriate range, which is exactly the line of thought that guides the Chazan-
Zakai-Ziv bound [3] for two hypotheses. Here, as we consider $M$ hypotheses rather than
two,^4 and $M$ is exponentially large, the integration range of $\Delta$ in the corresponding lower
bound, where the integrand is $P_e(u,\Delta)$, must be limited to the interval $(0, 1/(M-1)]$, as
otherwise, some grid points $\{u + i\Delta\}$ (in the proof of Theorem 1) would fall outside the
interval $[-1/2, +1/2)$. This limitation on the range of $\Delta$ causes the resulting lower bound on
the MSE to be relatively weak. One of the main points in this paper is that by considering
the large deviations performance as our figure of merit in the first place, we actually avoid
the need to integrate over $\Delta$ altogether. An interesting open question, in this context, is
whether it is possible to devise a modulator-estimator pair, which would be independent of
$\Delta$, but yet achieve asymptotically optimum large deviations performance for all $\Delta$ in the
interesting range. Such an estimator may also achieve asymptotically optimum MSE, in
view of eq. (23).
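
The identity that links the MSE to the tail probabilities is easy to check numerically on a toy example. The sketch below assumes, purely for illustration, an estimation error that is uniformly distributed on $[-a, a]$ with $a = 1/4$, for which the MSE is $a^2/3$ and $\Pr\{|\text{error}| > \Delta\} = \max(0, 1 - \Delta/a)$; the helper name `mse_from_tail` and the midpoint-rule integration are choices made here, not part of the original text.

```python
def mse_from_tail(tail, upper, steps=100000):
    """Midpoint-rule evaluation of 2 * integral_0^upper Delta * tail(Delta) dDelta."""
    h = upper / steps
    return 2.0 * sum((k + 0.5) * h * tail((k + 0.5) * h) for k in range(steps)) * h

a = 0.25                                        # illustrative half-width of the error support
tail = lambda delta: max(0.0, 1.0 - delta / a)  # Pr{|error| > delta} for a uniform error on [-a, a]
print(mse_from_tail(tail, upper=1.0), a * a / 3.0)  # the two values agree up to discretization error
```

Any lower bound on the tail probability that holds over a range of $\Delta$ can be plugged into the same integral, which is how tail bounds translate into MSE bounds.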

3 Discussion 

In this section, we pause to discuss a few observations, implications, and modifications of 
Theorems 1 and 2. 

3.1 Strong Converse and the Threshold Effect 

The case $R = 0$, discussed in Section 2, is one interesting extreme of the range of $R$. The
other extreme is the point $R = C$, where $E(R)$ vanishes. Here, due to the strong converse
to the channel coding theorem, $E(R)$ vanishes in the strong sense for $R > C$, namely, the
probability of error tends to unity. Owing to the proof of Theorem 1, the large deviations
estimation performance criterion, considered in this paper, 'inherits' this strong converse,
and then the probability of excess estimation error tends to unity as $T \to \infty$, for $R > C$.
This means an abrupt threshold effect in the limiting probability of excess error, from 0 to
1, as $R$ crosses $C$.



^4 Here, two hypotheses correspond to antipodal signals, rather than orthogonal signals, and hence lead to
non-tight exponential error bounds with a loss of 3dB.
3.2 Achievability by Other Schemes

As mentioned in the Introduction, alternative achievability proofs are possible by analyzing 
FPM and PPM systems. The FPM modulator (see, e.g., [21]) is defined as follows: 

$$x(t,u) = \sqrt{2S}\cos[2\pi(f_0 + u\cdot\Delta f)t], \tag{24}$$

where in our case, both the central frequency $f_0$ and the frequency offset $\Delta f$ ($\Delta f \ll f_0$) are
taken to be proportional to $e^{RT}$. For the ML estimator $\hat{U}$, in this case, $\Pr\{|\hat{U}-U| > e^{-RT}\}$
is the probability of anomaly, which is essentially $e^{-E(R)T}$ (see [21, eqs. (8.175a)-(8.175c)]).
Another good modulator for our purpose is PPM, where

$$x(t,u) = s[t - (u + 1/2)(T - \tau)], \tag{25}$$

$s[\cdot]$ being a pulse whose support is $[0,\tau]$, where $\tau$ is proportional to $e^{-RT}$ (and hence the
bandwidth is proportional to $e^{RT}$), and again, the large deviations event in question is the
anomaly event (see, e.g., [13], [25] for more details).

3.3 Relation to Bounds on Moments of the Estimation Error 

The combination of Theorem 1 with Chebyshev's inequality,

$$\Pr\{|\hat{U}-U| > e^{-RT}\} \le \frac{E(\hat{U}-U)^2}{e^{-2RT}}, \tag{26}$$

yields the following lower bound on the MSE

$$E(\hat{U}-U)^2 \ge (1 - e^{-\epsilon T})\,e^{-T[E(R-\epsilon)+2R+o(T)]}, \tag{27}$$

which is tightest for $R = \epsilon \to 0$, as $T \to \infty$. Thus, the MSE is lower bounded by an
expression whose exponential order is $e^{-E(0)T} = e^{-CT/2}$, as discussed in the Introduction
(see also [14]). The same comment applies, of course, to more general moments of the
estimation error, $E|\hat{U}-U|^{\alpha}$, in the range $\alpha \ge 1$ (see also [27, p. 388, Remark 1]). For
$0 < \alpha < 1$, the best choice of $R$ is near $R = C/(1+\alpha)^2$ and the resulting lower bound is of
the exponential order of $\exp[-\alpha CT/(1+\alpha)]$.
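
The choice of $R$ in these moment bounds amounts to minimizing $E(R) + \alpha R$ over $0 \le R \le C$. The sketch below performs this minimization by a plain grid search and compares it with the closed-form values quoted above, $C/2$ for $\alpha \ge 1$ and $\alpha C/(1+\alpha)$ for $0 < \alpha < 1$. The function names, the grid resolution and the value $C = 1$ are illustrative assumptions.

```python
import math

def reliability(R, C):
    """E(R) of the infinite-bandwidth AWGN channel with capacity C."""
    if R < C / 4:
        return C / 2 - R
    if R < C:
        return (math.sqrt(C) - math.sqrt(R)) ** 2
    return 0.0

def moment_exponent_numeric(alpha, C, steps=200000):
    """Grid search for the minimum over 0 <= R <= C of E(R) + alpha * R."""
    return min(reliability(k * C / steps, C) + alpha * k * C / steps
               for k in range(steps + 1))

def moment_exponent_closed_form(alpha, C):
    """C/2 for alpha >= 1 (best R near 0); alpha*C/(1+alpha) for 0 < alpha < 1."""
    return C / 2 if alpha >= 1 else alpha * C / (1 + alpha)

C = 1.0   # illustrative capacity value (assumption)
for alpha in (0.25, 0.5, 1.0, 2.0):
    # Numerical and closed-form exponents should agree up to the grid resolution.
    print(alpha, moment_exponent_numeric(alpha, C), moment_exponent_closed_form(alpha, C))
```

For $\alpha < 1$ the minimizer lies on the curved part of $E(R)$, at $R = C/(1+\alpha)^2 \ge C/4$, whereas for $\alpha \ge 1$ it sits at the left end point $R = 0$.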

3.4 Relation to the Joint Source-Channel Excess Distortion Exponent

Note that if we think of the random parameter $U$ as a source variable, and the
modulation-estimation problem is considered as a joint source-channel coding problem,
then our conclusion from Theorem 2 is that separate source- and channel coding is
asymptotically optimum in our setting: In the modulation scheme analyzed in the proof of Theorem
2, the transmitter first uses a source encoder that quantizes the parameter $U$, applying a
simple uniform scalar quantizer (see also [15]), and then maps the quantized version of $U$
into a channel input waveform using a good channel code. The same comment applies to the
case where the parameter is a vector $U = (U_1, \ldots, U_d)$, as will be discussed in Subsection
4.1, where the source encoder will quantize each component $U_i$ individually.

It is interesting to contrast this with the results of Csiszár [5] (see also [4]), where
exponential rates of probabilities of excess end-to-end distortion between a source vector
and its reconstruction vector were studied under a joint source-channel coding setting.^5
In that work, it was argued that, in general, separate source- and channel coding is
suboptimum in the error exponent sense (see the discussion in the second to last paragraph of
the Introduction of [5] as well as in [4, Introduction] and [7, Problem 5.16, pp. 534-535]).
The natural question that arises is how these two (seemingly contradictory) facts can be
reconciled, if there is any contradiction at all. First, observe that there are some differences between our
setting and the one in [5]:

1. In our setting, the source variable $U$ is a scalar, namely, it remains of "block-length"
1, when $T$ goes to infinity, whereas in [5] the analogous quantities grow together
with a fixed ratio (which is known as the bandwidth expansion factor). Even in
Subsection 4.1, where as mentioned earlier, we extend our setting to the case of a
vector parameter, $U = (U_1, \ldots, U_d)$, the dimension $d$ will be assumed fixed while
$T \to \infty$.

2. As another difference in the asymptotic regime, in our case, the allowed distortion 
threshold decays exponentially, whereas in [5] it is fixed. 

3. For the AWGN with infinite bandwidth, the reliability function is fully known, as 
opposed to that of a general DMC. 

Nonetheless, in spite of these differences, our results can be understood in the framework of
[5]. It turns out that while in general, there is no separation theorem for error exponents,



^5 In other words, instead of analyzing the performance of the communication system under the criterion
of average distortion, it was analyzed in [5] under the probability that the block distortion would exceed a
certain threshold in the large deviations regime.






the parameter modulation-estimation problem considered here is analogous to a special 
case, where a separation theorem holds true for error exponents nevertheless. 

To be more specific, Csiszár's main result in [5] can be presented essentially as follows:
The best excess distortion exponent of joint source-channel coding is upper bounded by

$$e(D) = \min_R\,[F(D,R) + E(R)], \tag{28}$$

where

$$F(D,R) = \min_{\{Q':\ R(D,Q') \ge R\}} D(Q'\|Q) \tag{29}$$

is Marton's source coding (excess distortion) exponent of the source $Q$ [12], $R(D,Q')$ being
the rate-distortion function of a source $Q'$, and $E(R)$ is the reliability function of the
channel. Now, consider the source $Q^*$ that maximizes $R(D,Q)$ (which is the uniform source
in many cases, in analogy to our continuous-valued uniform source $U$). For this source,

$$F(D,R) = \begin{cases} 0, & R \le R(D,Q^*) \\ \infty, & R > R(D,Q^*) \end{cases} \tag{30}$$

This is the case where the entire source space can be fully covered by spheres of normalized
radius $D$. In this case, the minimization range in the expression of $e(D)$ obviously reduces
to the range $R \le R(D,Q^*)$, where the contribution of the source coding exponent vanishes
and hence we are left with $e(D) = E[R(D,Q^*)]$. This can be seen as follows:

$$\begin{aligned}
e(D) &= \min_R\,[F(D,R) + E(R)] \\
&= \min_{R \le R(D,Q^*)}[0 + E(R)] \\
&= E[R(D,Q^*)].
\end{aligned} \tag{31}$$

We now argue that this is a case where separate source- and channel coding happens to
be optimal: If the source sequence space is fully covered by spheres of radius $D$, the source
encoder contributes nothing to the excess distortion event and so, excess distortion may
happen only in the event of a channel error whose exponent is $E(R)$, computed at $R = R(D,Q^*)$,
which is exactly the above mentioned expression of $e(D)$. Indeed, from the mathematical
point of view, the source-channel excess distortion exponent pertaining to separate source-
and channel coding, denoted by $e_{\mathrm{sep}}(D)$, and given by $\sup_R\min\{F(D,R), E(R)\}$, is also
equal to $E[R(D,Q^*)]$ in this case. This is easily shown as follows:

$$\begin{aligned}
e_{\mathrm{sep}}(D) &= \sup_R\min\{F(D,R), E(R)\} \\
&= \sup_R \begin{cases} 0, & R \le R(D,Q^*) \\ E(R), & R > R(D,Q^*) \end{cases} \\
&= E[R(D,Q^*)].
\end{aligned} \tag{32}$$



This is clearly analogous to our case: We fully cover the unit interval with small intervals of
size $2e^{-RT}$ using a rate-$R$ source code. Similarly, in the $d$-dimensional case to be described
in Subsection 4.1, we perfectly cover the unit cube by boxes of sizes $2e^{-R_1T}\times\cdots\times 2e^{-R_dT}$
using a code of rate $R_1 + \cdots + R_d$.

4 Extensions 

In this section, we extend Theorems 1 and 2 in several directions (one at a time). These 
include the multidimensional case, more general channels, and allowing a variable power 
that depends on the parameter. 

4.1 The Multidimensional Case 

The extension to a multidimensional parameter vector is conceptually quite straightforward.
Suppose now that the parameter is a vector $u = (u_1, \ldots, u_d) \in [-1/2, +1/2)^d$, which
is a realization of a random vector $U = (U_1, \ldots, U_d)$, uniformly distributed over the
$d$-dimensional unit hypercube $[-1/2, +1/2)^d$. Consider now the probability

$$\Pr\left(\bigcup_{i=1}^{d}\left\{|\hat{U}_i - U_i| > e^{-R_iT}\right\}\right).$$

Then, here both in the upper bound and the lower bound, the $d$-dimensional unit cube is divided by a Cartesian grid with about $e^{R_iT}$ points in each dimension, $i = 1,2,\ldots,d$, thus a total of $e^{(R_1+R_2+\cdots+R_d)T}$ points, which means an effective rate of $R_1 + R_2 + \cdots + R_d$. More precisely, the lower bound is now given by

$$
\Pr\left\{\bigcup_{i=1}^{d}\left\{|\hat{U}_i - U_i| > e^{-R_iT}\right\}\right\} \ge (1 - e^{-\epsilon T})^d \exp\{-T[E(R_1 + R_2 + \cdots + R_d - \epsilon d) + o(T)]\} \tag{33}
$$

since the integration in eq. (13) now becomes $d$-dimensional. In the upper bound, we are again quantizing and transmitting one of $e^{(R_1+R_2+\cdots+R_d)T}$ orthogonal codewords, the one which represents the corresponding quantization cell. Thus, the probability of the undesired event in question is of the exponential order of $e^{-TE(R_1+R_2+\cdots+R_d)}$. Considering the case $R_i = R$ for all $i \in \{1,2,\ldots,d\}$ (hence $\sum_i R_i = R\cdot d$), there is then an interesting threshold effect in the dimensionality of the problem: For $R = 0$ (in the weak sense), the exponential rate of decay of the probability of the large deviations event $\bigcup_{i=1}^{d}\{|\hat{U}_i - U_i| > e^{-RT}\}$ is essentially $E(0) = C/2$, independently of $d$. For $R > 0$, the behavior is as follows: As long as

$$
d \le d_c = \lfloor C/R \rfloor, \tag{34}
$$

the probability of the event $\bigcup_{i=1}^{d}\{|\hat{U}_i - U_i| > e^{-RT}\}$ tends to zero as $T \to \infty$. But when $d$ exceeds $d_c$, and hence the effective rate $R\cdot d$ exceeds $C$, the probability tends to unity. Thus, $d_c$ is a critical dimension in this sense. This abrupt transition from 0 to 1 in the limiting probability of excess error is another aspect of the threshold effect. In most estimation problems we normally encounter, the estimation performance degrades with the dimensionality (an effect known as the "curse of dimensionality"), but usually the degradation is graceful and not abrupt as here.
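The threshold in the dimension can be made concrete numerically. The sketch below is only an illustration under stated assumptions: it uses the piecewise form of the reliability function of the infinite-bandwidth AWGN channel (the same form as eq. (39) with $a = 1$), it takes the critical dimension to be the largest $d$ with $R\cdot d \le C$, and the names `reliability` and `critical_dimension`, as well as the values of $C$ and $R$, are hypothetical.

```python
import math

def reliability(R, C):
    """Reliability function E(R) of the infinite-bandwidth AWGN channel
    (assumed piecewise form, i.e. the a = 1 case of eq. (39))."""
    if R >= C:
        return 0.0
    if R < C / 4:
        return C / 2 - R
    return (math.sqrt(C) - math.sqrt(R)) ** 2

def critical_dimension(C, R):
    """Largest d for which the effective rate R*d does not exceed C."""
    return math.floor(C / R)

C, R = 1.0, 0.3   # hypothetical capacity and per-coordinate rate
for d in range(1, 6):
    # The exponent is positive up to d_c and collapses to zero beyond it.
    print(d, round(reliability(R * d, C), 4))
print("critical dimension:", critical_dimension(C, R))
```

For these assumed values the exponent shrinks as $d$ grows and is exactly zero once $R\cdot d$ exceeds $C$, which is the regime where the text states that the probability of excess error tends to unity.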

All this discussion can be extended, in principle, from Cartesian lattices in the parameter 
space to general lattices, where the undesired excess error event is defined as the event where 
the estimated parameter vector falls outside the respective Voronoi cell centered at the true 
parameter vector. Here, the effective rate to be used as the argument of the reliability 
function is determined by the normalized logarithm of the ratio between the volume of the 
source vector space and the volume of a basic cell. 

4.2 Other Channels 

The assumption of an AWGN channel with unlimited bandwidth was not used very strongly beyond the fact that for this particular channel, the reliability function is fully known for the entire range of rates, $0 \le R < C$. But the reliability function is also known for the Poisson channel with unlimited bandwidth [22], [23]. Here too, the idea would be to first quantize the parameter and then to use a good code for the Poisson channel that asymptotically achieves the reliability function, e.g., the Wyner code (see also [15]). Similar comments apply also to more general channels in the limit of the infinite bandwidth regime [8].

In the discrete-time case, the reliability function may not be known for the entire range of rates, but it is known for all rates above the critical rate, where it is also achievable by random coding. Moreover, even if the channel is not fully known, we can derive a universal estimator that relies on a universal decoder for memoryless channels (see, e.g., [6] and references therein), on the basis of the proof of the achievability in Theorem 2. But even at rates below the critical rate, where the reliability function is not known, the basic principle of optimum modulation-estimation using a separation-based scheme continues to hold: First quantize $U$ uniformly and then apply an optimum channel code.

The modification of our results to discrete memoryless channels also enables us to handle, at least partially, the case of the AWGN channel with limited bandwidth. This is because the case of limitation to finite bandwidth $W$ is asymptotically equivalent to the discrete memoryless Gaussian channel with $N = 2WT$ channel uses (pertaining to $N = 2WT$ orthonormal basis functions that span the subspace of allowable signals). In this case, $E(R)$ for high rates agrees with the sphere-packing bound, which in the Gaussian band-limited case is given by

$$
E_{\rm sp}(R) = \max_{\rho \ge 0}\left\{\rho W \ln\left[1 + \frac{S}{N_0W(1+\rho)}\right] - \rho R\right\}. \tag{35}
$$



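The critical rate beyond which $E_{\rm sp}(R) = E(R)$ is given by

$$
R_c(W) = \frac{d}{d\rho}\left\{\rho W \ln\left[1 + \frac{S}{N_0W(1+\rho)}\right]\right\}\Bigg|_{\rho=1} = W\left[\ln\left(1 + \frac{S}{2N_0W}\right) - \frac{1}{2}\cdot\frac{S}{S + 2N_0W}\right], \tag{36}
$$

where the maximum over $\rho$ in (35) is achieved within the interval $[0,1]$.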
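As a numerical sanity check of the band-limited expressions, the sketch below evaluates the exponent of (35) by a plain grid search over $\rho$ and compares the closed form of (36) with a finite-difference derivative at $\rho = 1$. It is only an illustration: the function names, the grid parameters `rho_max` and `steps`, and the values of $S$, $N_0$ and $W$ are assumptions, not quantities from the paper.

```python
import math

def e0(rho, S, N0, W):
    """The function rho*W*ln(1 + S/(N0*W*(1+rho))) that appears in (35)."""
    return rho * W * math.log(1 + S / (N0 * W * (1 + rho)))

def sphere_packing_exponent(R, S, N0, W, rho_max=50.0, steps=5000):
    """Grid search for the maximum over rho >= 0 of e0(rho) - rho*R.
    rho_max and steps are illustrative numerical choices."""
    grid = (k * rho_max / steps for k in range(steps + 1))
    return max(e0(rho, S, N0, W) - rho * R for rho in grid)

def critical_rate(S, N0, W):
    """Closed form (36): derivative of e0 with respect to rho at rho = 1."""
    return W * (math.log(1 + S / (2 * N0 * W)) - 0.5 * S / (S + 2 * N0 * W))

S, N0, W = 4.0, 1.0, 2.0   # hypothetical power, noise level and bandwidth
h = 1e-6
numeric = (e0(1 + h, S, N0, W) - e0(1 - h, S, N0, W)) / (2 * h)
print(critical_rate(S, N0, W), numeric)
print(sphere_packing_exponent(critical_rate(S, N0, W), S, N0, W))
```

The two printed critical-rate values should agree up to the finite-difference error, and the exponent vanishes at the band-limited capacity $W\ln[1 + S/(N_0W)]$, where the maximizing $\rho$ is zero.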

4.3 The AWGN Channel With Rayleigh Fading 

Another important channel model is the AWGN channel with Rayleigh fading. Here, the 
signal model is 

$$
y(t) = a\cdot x(t,u) + n(t), \qquad t \in [0,T), \tag{37}
$$

where $a$ is a realization of a Rayleigh random variable $A$, whose pdf is given by

$$
f_A(a) = \frac{a}{\sigma^2}e^{-a^2/(2\sigma^2)}, \qquad a \ge 0. \tag{38}
$$

It is assumed that $A$ is independent of $U$, as well as of the noise $\{n(t),\ 0 \le t < T\}$. It is instructive to examine the best achievable behavior of the probability of excess estimation error under this fading model.

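For a given $A = a$, the received signal has power $a^2S$, which implies that the channel capacity is $a^2S/N_0 = a^2C$. Correspondingly, the reliability function is given by

$$
E_a(R) = \begin{cases} \dfrac{a^2C}{2} - R, & 0 \le R < \dfrac{a^2C}{4} \\ (a\sqrt{C} - \sqrt{R})^2, & \dfrac{a^2C}{4} \le R < a^2C \\ 0, & R \ge a^2C \end{cases} \tag{39}
$$

Equivalently, if we think of $E_a(R)$ as a function of $a$ parametrized by $R$, then

$$
E_a(R) = \begin{cases} 0, & a \le \sqrt{R/C} \\ (a\sqrt{C} - \sqrt{R})^2, & \sqrt{R/C} < a \le 2\sqrt{R/C} \\ \dfrac{a^2C}{2} - R, & a > 2\sqrt{R/C} \end{cases} \tag{40}
$$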
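The two ways of writing the conditional reliability function, by cases on the rate for a fixed gain and by cases on the gain for a fixed rate, can be checked against each other numerically. The sketch below is a hedged illustration: the function names and the values of $C$ and $R$ are hypothetical, and the case boundaries follow the piecewise form of the infinite-bandwidth AWGN reliability function with capacity $a^2C$.

```python
import math

def exponent_by_rate(R, a, C):
    """E_a(R) with cases on R for a fixed gain a, as in eq. (39)."""
    cap = a * a * C                      # capacity seen for this gain
    if R >= cap:
        return 0.0
    if R < cap / 4:
        return cap / 2 - R
    return (a * math.sqrt(C) - math.sqrt(R)) ** 2

def exponent_by_gain(R, a, C):
    """E_a(R) with cases on a for a fixed rate R, as in eq. (40)."""
    a0 = math.sqrt(R / C)                # gain below which the channel is in outage
    if a <= a0:
        return 0.0
    if a <= 2 * a0:
        return (a * math.sqrt(C) - math.sqrt(R)) ** 2
    return a * a * C / 2 - R

C, R = 2.0, 0.5   # hypothetical values
for a in (0.3, 0.5, 0.8, 1.0, 1.5):
    print(a, exponent_by_rate(R, a, C), exponent_by_gain(R, a, C))
```

Both functions return the same value for every gain, which reflects that the two parametrizations describe one and the same function.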



In view of Theorems 1 and 2, averaging the upper and lower bounds on the probability of
decoding error given a, would yield respective bounds for the fading channel. For the lower 
bound, this averaging is legitimate as it corresponds to a receiver that is informed of the 
realization a of the random variable A. For the upper bound, this is legitimate too since 
the ML decoder does not depend on (the possibly unknown value of) a in the regime of 
equal-energy signals considered here. 

As before, one should distinguish between the cases $R > 0$ and $R = 0$ (in the strong sense). The following two results are shown in Appendix A. For the case $R > 0$, the probability of excess estimation error is essentially equal (for large $T$) to the probability of channel outage, which is

$$
\Pr\{A < \sqrt{R/C}\} = 1 - e^{-R/(2\bar{C})}, \tag{41}
$$

where $\bar{C} = \sigma^2C$ designates the average capacity of the channel. In other words, there is no decay as $T \to \infty$. For $R = 0$, the best achievable probability of excess estimation error decays at the rate of $1/T$ rather than exponentially with $T$.
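The outage probability can be checked by simulation. The following sketch is an illustration under assumptions: it takes the Rayleigh gain to have parameter $\sigma$, so that $\Pr\{A < x\} = 1 - e^{-x^2/(2\sigma^2)}$, draws samples by inverse-CDF sampling, and uses hypothetical function names and parameter values.

```python
import math
import random

def outage_probability(R, C, sigma):
    """Closed form: Pr{A < sqrt(R/C)} for a Rayleigh gain with parameter sigma."""
    return 1 - math.exp(-R / (2 * sigma ** 2 * C))

def simulate_outage(R, C, sigma, trials=200_000, seed=0):
    """Monte Carlo estimate of the fraction of gains with a^2 * C < R."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        # Inverse-CDF sampling of a Rayleigh variable; 1 - random() lies in (0, 1].
        a = sigma * math.sqrt(-2 * math.log(1 - rng.random()))
        if a * a * C < R:
            hits += 1
    return hits / trials

R, C, sigma = 0.5, 2.0, 1.0   # hypothetical values
print(outage_probability(R, C, sigma), simulate_outage(R, C, sigma))
```

The estimate does not depend on $T$ at all, which is the point of the statement that the probability of excess estimation error does not decay when $R > 0$.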

4.4 Variable Transmission Power 

In Section 2, we have restricted the class of modulators in a manner that the power of the transmitted signal, $\{x(t,u),\ 0 \le t < T\}$, is always $S$, independently of $u$. Consider the somewhat broader setting, where the power of $\{x(t,u),\ 0 \le t < T\}$, denoted $S(u)$, is allowed to depend on $u$, and we only limit the average power according to

$$
E\{S(U)\} = \int_{-1/2}^{+1/2} du\cdot S(u) \le S. \tag{42}
$$

We argue that our results apply to this wider class of modulators as well.

Concerning the achievability, we continue to use the same modulator and estimator as in the proof of Theorem 2, where the power is $S(u) = S$ for every $u$. The proof of Theorem 1, on the other hand, has to be extended to allow variable power. The point is that the proof of Theorem 1 in Section 2 relies heavily on the lower bound on the probability of error in $M$-ary signal detection, which, in [19, Section 3.6.1], is derived under the assumption of equal-energy signals, and we are not aware of an existing extension of this result to allow sets of signals with different energies, where the limitation is on the average energy only. In Appendix B, we extend the proof of Theorem 1 to accommodate a given average energy constraint, or equivalently, an average power constraint (42). In a nutshell, the intuition is that when some of the signals have higher power and some have lower power, the probability of error is basically dominated by those with the lower power, which is, of course, smaller than the average $S$. Thus, variable power signal sets offer no improvement relative to fixed power signal sets in terms of achievable error exponents.
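The intuition that the low-power signals dominate can be illustrated with a toy calculation. The sketch below is not the argument of Appendix B: it simply averages $e^{-TE(R,S_i)}$ over two equally likely power levels with the same mean as a fixed-power set, assuming the piecewise reliability function of the infinite-bandwidth AWGN channel with $C = S/N_0$. All names and numbers are hypothetical.

```python
import math

def reliability(R, S, N0=1.0):
    """E(R, S): infinite-bandwidth AWGN reliability function with C = S/N0."""
    C = S / N0
    if R >= C:
        return 0.0
    if R < C / 4:
        return C / 2 - R
    return (math.sqrt(C) - math.sqrt(R)) ** 2

def average_error_factor(R, T, powers):
    """Average of exp(-T * E(R, S_i)) over an equally weighted set of powers."""
    return sum(math.exp(-T * reliability(R, s)) for s in powers) / len(powers)

R, T = 0.5, 10.0                                  # hypothetical rate and duration
fixed = average_error_factor(R, T, [4.0])          # every signal uses power 4
mixed = average_error_factor(R, T, [3.0, 5.0])     # same average power, two levels
print(fixed, mixed)   # the low-power term dominates the mixed average
```

Because $e^{-TE(R,S)}$ is convex in $S$ for large $T$, the mixed average is larger than the fixed-power value, so spreading the power unevenly does not help in this toy setting.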
Appendix A

In this appendix, we derive the results for the fading channel for the case $R > 0$ and the case $R = 0$.

Consider the case $R > 0$ first. Here, there is a positive probability that $a$ would be small enough that the corresponding capacity $a^2C$ would fall below the given $R$, which is exactly the event of channel outage. This happens with probability

$$
\Pr\{A^2C < R\} = \int_0^{\sqrt{R/C}} da\cdot\frac{a}{\sigma^2}e^{-a^2/(2\sigma^2)} = 1 - e^{-R/(2\sigma^2C)} = 1 - e^{-R/(2\bar{C})}. \tag{A.1}
$$

Owing to the discussion in Subsection 3.1, in the event of outage, the probability of excess estimation error is very close to unity, and so, the overall probability of excess estimation error is essentially lower bounded by the outage probability, i.e.,

$$
\Pr\{|\hat{U} - U| > e^{-RT}\} \ge [1 - o(T)]\cdot(1 - e^{-R/(2\bar{C})}), \tag{A.2}
$$

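that is, the probability of excess estimation error no longer decays as $T$ grows without bound. Concerning the upper bound, we have from eq. (40)

$$
\begin{aligned}
\int_0^\infty da\cdot\frac{a}{\sigma^2}e^{-a^2/(2\sigma^2)}e^{-TE_a(R)} &= \int_0^{\sqrt{R/C}} da\cdot\frac{a}{\sigma^2}e^{-a^2/(2\sigma^2)} \\
&\quad + \int_{\sqrt{R/C}}^{2\sqrt{R/C}} da\cdot\frac{a}{\sigma^2}e^{-a^2/(2\sigma^2)}e^{-T(a\sqrt{C}-\sqrt{R})^2} \\
&\quad + \int_{2\sqrt{R/C}}^{\infty} da\cdot\frac{a}{\sigma^2}e^{-a^2/(2\sigma^2)}e^{-T(a^2C/2-R)} \\
&= 1 - e^{-R/(2\bar{C})} + o(T),
\end{aligned} \tag{A.3}
$$

where the last line follows from the fact that the above two last integrals, over the ranges $[\sqrt{R/C}, 2\sqrt{R/C})$ and $[2\sqrt{R/C}, \infty)$, both vanish as $T \to \infty$, as can easily be shown. Thus, the lower bound and the upper bound asymptotically coincide.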

As for the case $R = 0$ (i.e., $\Delta$ fixed, but small), then in view of the derivations in Section 2 (see eqs. (21) and (22)), for a given $a$, both the upper bound and the lower bound on the probability of excess estimation error admit the form $\alpha\cdot Q(a\sqrt{CT\beta})$, where $\alpha$ and $\beta$ are constants. The respective constants, $\alpha$ and $\beta$, pertaining to the upper bound and the lower bound, are different. However, $\alpha$ is just a multiplicative constant, which is of secondary importance here, because we are primarily interested in the rate of decay of both bounds as $T \to \infty$. On the other hand, $\beta$ is very close to unity in both bounds when $\Delta$ is small. Thus, the quantity of interest is basically the expectation of $Q(A\sqrt{CT})$ w.r.t. the randomness of $A$. We next show that for large $T$, this quantity is well-approximated by

$$
E\{Q(A\sqrt{CT})\} = \int_0^\infty da\cdot\frac{a}{\sigma^2}e^{-a^2/(2\sigma^2)}Q(a\sqrt{CT}) \approx \frac{1}{4\bar{C}T}, \tag{A.4}
$$

that is, the minimum achievable excess estimation error probability decays algebraically rather than exponentially. Using Craig's formula (see, e.g., [17]),

$$
Q(x) = \frac{1}{\pi}\int_0^{\pi/2} d\theta\,\exp\left(-\frac{x^2}{2\sin^2\theta}\right), \tag{A.5}
$$



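we have the following:

$$
\begin{aligned}
E\{Q(A\sqrt{CT})\} &= \int_0^\infty da\cdot\frac{a}{\sigma^2}e^{-a^2/(2\sigma^2)}Q(a\sqrt{CT}) \\
&= \frac{1}{\sigma^2}\cdot\frac{1}{\pi}\int_0^{\pi/2} d\theta\int_0^\infty da\cdot a\exp\left\{-\frac{a^2}{2}\left[\frac{1}{\sigma^2} + \frac{CT}{\sin^2\theta}\right]\right\} \\
&= \frac{1}{\pi}\int_0^{\pi/2}\frac{d\theta}{1 + \bar{C}T/\sin^2\theta} \\
&= \frac{1}{\pi}\int_0^{\pi/2}\frac{d\theta\cdot\sin^2\theta}{\bar{C}T + \sin^2\theta}.
\end{aligned} \tag{A.6}
$$

The last expression can be upper bounded and lower bounded by bounding the $\sin^2\theta$ term of the denominator by 0 and 1, respectively. Both bounds are well approximated by $1/(4\bar{C}T)$ for large $T$.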
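The two bounds on the last integral can be checked numerically. The sketch below is a hedged illustration: it evaluates the integral by a midpoint rule (a numerical choice made here, not part of the paper), treats the product $\bar{C}T$ as a single hypothetical parameter `cbar_t`, and compares the result with the bounds obtained by replacing $\sin^2\theta$ in the denominator by 1 and by 0.

```python
import math

def expected_q(cbar_t, steps=20_000):
    """Midpoint-rule value of (1/pi) * integral over [0, pi/2] of
    sin^2(theta) / (cbar_t + sin^2(theta)), with cbar_t standing for Cbar*T."""
    h = (math.pi / 2) / steps
    total = 0.0
    for k in range(steps):
        s2 = math.sin((k + 0.5) * h) ** 2
        total += s2 / (cbar_t + s2)
    return total * h / math.pi

for cbar_t in (10.0, 100.0, 1000.0):   # hypothetical values of Cbar*T
    value = expected_q(cbar_t)
    lower = 1 / (4 * (cbar_t + 1))     # sin^2 in the denominator replaced by 1
    upper = 1 / (4 * cbar_t)           # sin^2 in the denominator replaced by 0
    print(cbar_t, lower, value, upper)
```

As $\bar{C}T$ grows, the lower and upper bounds close in on each other, which is the sense in which the expectation behaves like $1/(4\bar{C}T)$.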

Appendix B 

In this appendix, we provide an outline of the extension of Theorem 1 to the variable power 
case. 

For a given $u$ and $\Delta$, consider again the grid $\{u + i\Delta\}_{i=0}^{M-1}$, which is assumed to lie entirely in $[-1/2,+1/2)$, and let $\Delta = 2e^{-RT}$ and $M = e^{(R-\epsilon)T}/2 + 1$, as before. Let $S_{\min} = \min_u S(u)$ and $S_{\max} = \max_u S(u)$. We first argue that for modulators whose power function $S(u)$ is continuous (or at least, left- or right-continuous) in the vicinity of its minimum, the assertion of Theorem 1 is rather straightforward in the range $C_{\min} < R < C$, where $C_{\min} = S_{\min}/N_0$. The reason is that the grid points, $\{u + i\Delta\}_{i=0}^{M-1}$, where $u$ is near the minimum of the power function (and so are all other grid points, with the above assignment of $M$ and $\Delta$), constitute a signal set whose rate, $R - \epsilon$, is very close to (or even exceeds) its capacity, which is about $C_{\min}$, since all signals in this grid have power near $S_{\min}$. Thus, this grid dominates the probability of error (and hence also the probability of excess estimation error) and it dictates a sub-exponential decay at best, which is trivially lower bounded by the exponential $\exp[-TE(R)]$. In view of this, we shall confine attention throughout to the range of rates $0 \le R < C_{\min}$.

Now, consider the partition of the range of powers $[S_{\min}, S_{\max}]$ into small bins of width $\delta$, where $\delta$ is assumed to divide $S_{\max} - S_{\min}$. For a given $u$, let $\mathcal{C}_i$ denote the subset of integers $\{j\}$ for which $S(u + j\Delta)$ falls in the $i$-th bin, that is,

$$
S_{\min} + i\delta \le S(u + j\Delta) < S_{\min} + (i+1)\delta, \qquad i = 0,1,\ldots,r-1, \tag{B.1}
$$

where $r = (S_{\max} - S_{\min})/\delta$. First, observe that

$$
P_e(u,\Delta) \ge \sum_{i=0}^{r-1}\frac{|\mathcal{C}_i|}{M}P_e(\mathcal{C}_i), \tag{B.2}
$$

where $P_e(\mathcal{C}_i)$ is the probability of error pertaining to the subset of signals $\{x(t,u+j\Delta)\}_{j\in\mathcal{C}_i}$ alone. The reason for the inequality is that error events associated with confusion between pairs of signals that belong to different bins are not counted in the r.h.s. Next, for a given $\epsilon > 0$, let $\mathcal{I}_\epsilon$ denote the index set $\{i : |\mathcal{C}_i| \ge e^{-\epsilon T}M\}$. Then, obviously, $P_e(u,\Delta)$ is further lower bounded by

$$
P_e(u,\Delta) \ge \sum_{i\in\mathcal{I}_\epsilon}\frac{|\mathcal{C}_i|}{M}P_e(\mathcal{C}_i). \tag{B.3}
$$

Now, let us slightly alter the powers of all signals in $\mathcal{C}_i$ to be $S_i = S_{\min} + (i+1/2)\delta$, neglecting the effect that this may have on the exponent of $P_e(\mathcal{C}_i)$.⁶ Let us denote here the reliability function of a rate-$R$ code with power $S$ by $E(R,S)$, to emphasize the dependence on the power (via the dependence on the capacity). Then, for every $i \in \mathcal{I}_\epsilon$, we have

$$
P_e(\mathcal{C}_i) \ge e^{-T[E(R-2\epsilon,S_i)+o(T)]}, \tag{B.4}
$$

since the size of $\mathcal{C}_i$ is of the exponential order of at least $e^{(R-2\epsilon)T}$. Also, let us denote

$$
\pi_i = \frac{|\mathcal{C}_i|}{\sum_{i'\in\mathcal{I}_\epsilon}|\mathcal{C}_{i'}|}. \tag{B.5}
$$



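Then,

$$
P_e(u,\Delta) \ge \sum_{i\in\mathcal{I}_\epsilon}\frac{|\mathcal{C}_i|}{M}\cdot\sum_{i\in\mathcal{I}_\epsilon}\pi_ie^{-T[E(R-2\epsilon,S_i)+o(T)]}. \tag{B.6}
$$

As for the first factor on the r.h.s. of (B.6), we have

$$
\sum_{i\in\mathcal{I}_\epsilon}\frac{|\mathcal{C}_i|}{M} = 1 - \sum_{i\in\mathcal{I}_\epsilon^c}\frac{|\mathcal{C}_i|}{M} \ge 1 - \sum_{i\in\mathcal{I}_\epsilon^c}e^{-\epsilon T} \ge 1 - re^{-\epsilon T}, \tag{B.7}
$$

and so, this factor is lower bounded by $(1 - re^{-\epsilon T})$. Now, observe that the function $e^{-TE(R-2\epsilon,S)}$ is convex⁷ in $S$ for all $T > \sqrt{R}/[2\sqrt{C}(\sqrt{C}-\sqrt{R})^2]$. It follows then from (B.6) that

$$
P_e(u,\Delta) \ge (1 - re^{-\epsilon T})\cdot\exp\left\{-T\left[E\left(R-2\epsilon,\ \sum_{i\in\mathcal{I}_\epsilon}\pi_iS_i\right) + o(T)\right]\right\}. \tag{B.8}
$$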

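Next, we need an upper bound on $\sum_{i\in\mathcal{I}_\epsilon}\pi_iS_i$. This is accomplished as follows:

$$
\bar{S}(u) = \frac{1}{M}\sum_{i=0}^{M-1}S(u+i\Delta) \ge \sum_{i\in\mathcal{I}_\epsilon}\frac{|\mathcal{C}_i|}{M}\left(S_i - \frac{\delta}{2}\right) \ge \sum_{i\in\mathcal{I}_\epsilon}\frac{|\mathcal{C}_i|}{M}S_i - \frac{\delta}{2}, \tag{B.9}
$$

and so,

$$
\sum_{i\in\mathcal{I}_\epsilon}\pi_iS_i \le \frac{\bar{S}(u) + \delta/2}{1 - re^{-\epsilon T}}. \tag{B.10}
$$

Thus, from (B.8), we have

$$
P_e(u,\Delta) \ge (1 - re^{-\epsilon T})\cdot\exp\left\{-T\left[E\left(R-2\epsilon,\ \frac{\bar{S}(u)+\delta/2}{1 - re^{-\epsilon T}}\right) + o(T)\right]\right\}. \tag{B.11}
$$

⁶ The lower bounds on the probability of error of $M$ equal-energy signals are straightforwardly extended to allow almost equal powers (within $\pm\delta/2$), with only a small degradation in the exponential rate, which depends on $\delta$.

⁷ The function $e^{-Tf(x)}$ is convex in $x \in \mathcal{X}$ whenever $f$ is twice differentiable and $T > \sup_{x\in\mathcal{X}} f''(x)/[f'(x)]^2$, as can easily be seen from the second derivative of $e^{-Tf(x)}$. An alternative consideration is that for large $T$, the average of $e^{-Tf(x)}$ is dominated by $e^{-T\inf_{x\in\mathcal{X}}f(x)}$ and that $\inf_{x\in\mathcal{X}}f(x)$ is smaller than the average of $f(x)$ over $\mathcal{X}$.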

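Finally, we integrate both sides of the last inequality w.r.t. $u$, in order to relate it to the probability of excess estimation error, as in the proof of Theorem 1. To this end, we first observe the following:

$$
\begin{aligned}
\int_{-1/2}^{1/2-(M-1)\Delta} du\cdot\bar{S}(u) &= \int_{-1/2}^{1/2-(M-1)\Delta} du\cdot\frac{1}{M}\sum_{i=0}^{M-1}S(u+i\Delta) \\
&= \frac{1}{M}\sum_{i=0}^{M-1}\int_{-1/2}^{1/2-(M-1)\Delta} du\cdot S(u+i\Delta) \\
&= \frac{1}{M}\sum_{i=0}^{M-1}\int_{-1/2+i\Delta}^{1/2-(M-1)\Delta+i\Delta} du\cdot S(u) \\
&\le \frac{1}{M}\sum_{i=0}^{M-1}\int_{-1/2}^{1/2} du\cdot S(u) \\
&\le S,
\end{aligned} \tag{B.12}
$$

and therefore, for the above defined assignments of $\Delta$ and $M$, we have:

$$
\frac{1}{1-e^{-\epsilon T}}\int_{-1/2}^{1/2-e^{-\epsilon T}} du\cdot\bar{S}(u) \le \frac{S}{1-e^{-\epsilon T}}. \tag{B.13}
$$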



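Thus,

$$
\begin{aligned}
\Pr\{|\hat{U}-U| > e^{-RT}\} &\ge \int_{-1/2}^{1/2-e^{-\epsilon T}} du\cdot P_e(u, 2e^{-RT}) \\
&\ge (1-re^{-\epsilon T})(1-e^{-\epsilon T})\cdot\frac{1}{1-e^{-\epsilon T}}\int_{-1/2}^{1/2-e^{-\epsilon T}} du\cdot\exp\left\{-T\left[E\left(R-2\epsilon,\ \frac{\bar{S}(u)+\delta/2}{1-re^{-\epsilon T}}\right) + o(T)\right]\right\} \\
&\ge (1-re^{-\epsilon T})^2\exp\left\{-T\left[E\left(R-2\epsilon,\ \frac{S+\delta/2}{(1-re^{-\epsilon T})(1-e^{-\epsilon T})}\right) + o(T)\right]\right\} \\
&= (1-re^{-\epsilon T})^2e^{-T[E(R,S)+o'(T)]},
\end{aligned} \tag{B.14}
$$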



where in the last line, $o'(T)$ means another function (other than $o(T)$ of the previous lines) that tends to 0 as $T \to \infty$, which is obtained by letting $\epsilon$ and $\delta$ tend to zero as $T \to \infty$ at the appropriate rates (e.g., $\epsilon = \delta = 1/\sqrt{T}$).
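The binning step of this appendix can be sketched in a few lines of code. The example below is an illustration under assumptions rather than part of the proof: the power profile is a hypothetical sinusoid sampled on a grid, the minimum and maximum powers are taken over the grid points only, and the function name `bin_powers` is introduced here. It groups the powers into bins of width $\delta$, keeps the bins that hold at least $e^{-\epsilon T}M$ signals, and compares the weighted bin power $\sum_i \pi_iS_i$ with the bound $(\bar{S}(u)+\delta/2)/(1-re^{-\epsilon T})$.

```python
import math

def bin_powers(powers, delta, eps, T):
    """Group signal powers into bins of width delta, in the spirit of (B.1).

    Returns the number of bins r, the bin midpoints S_i, and the weights pi_i
    of the 'heavy' bins, i.e. those holding at least e^{-eps*T} * M signals.
    """
    M = len(powers)
    s_min, s_max = min(powers), max(powers)
    r = max(1, math.ceil((s_max - s_min) / delta))
    counts = [0] * r
    for s in powers:
        i = min(int((s - s_min) / delta), r - 1)   # the maximum goes to the last bin
        counts[i] += 1
    midpoints = [s_min + (i + 0.5) * delta for i in range(r)]
    heavy = [i for i in range(r) if counts[i] >= math.exp(-eps * T) * M]
    total = sum(counts[i] for i in heavy)
    weights = {i: counts[i] / total for i in heavy}
    return r, midpoints, weights

# Hypothetical power profile sampled on a grid of M = 1000 points.
M, delta, eps, T = 1000, 0.1, 0.5, 10.0
powers = [1.0 + 0.5 * math.sin(2 * math.pi * j / M) for j in range(M)]
r, midpoints, weights = bin_powers(powers, delta, eps, T)

weighted_power = sum(w * midpoints[i] for i, w in weights.items())
bound = (sum(powers) / M + delta / 2) / (1 - r * math.exp(-eps * T))
print(weighted_power, bound)
```

In this toy setting the weighted bin power stays below the bound, which is the property the proof uses before integrating over $u$.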